Computational Morphologies for Small Uralic Languages

Authors

  • GÁBOR PRÓSZÉKY
  • ATTILA NOVÁK

Abstract

This article presents a set of morphological tools for small Uralic languages. Various Hungarian research groups specializing in Finno-Ugric linguistics and a Hungarian language technology company (MorphoLogic) have initiated a project to produce annotated electronic corpora for small Uralic languages. The languages described include Mordvin, Udmurt (Votyak), Komi (Zyryan), Mansi (Vogul), Khanty (Ostyak), Tundra Nenets (Yurak) and Nganasan (Tavgi). Most of these languages are endangered and some are on the verge of extinction, so documenting them is an urgent scientific task. The most important subgoal of the project was to create morphological analyzers for the languages involved. For this we used the morphological analyzer engine Humor (‘High speed Unification MORphology’) developed at MorphoLogic (Prószéky and Kis, 1999), which had first been successfully applied to another Uralic (Finno-Ugric) language, Hungarian, and later to various Slavic, Germanic and Romance languages. We supplemented the analyzer with two additional tools: a lemmatizer and a morphological generator. We present the tools through their application to the Komi language, specifically to the standard Komi-Zyryan dialect. Creating analyzers for the two Samoyed languages involved in the project, ...
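The abstract names three tools: an analyzer, a lemmatizer and a generator. The Python sketch below is a minimal illustration of how their inputs and outputs relate. It does not reproduce the Humor engine. It assumes a hypothetical two-noun Hungarian lexicon with hand-coded plural and case allomorphs (vowel harmony only, no other alternations). It builds the analyzer by inverting the generator over every tag combination. All names (`LEXICON`, `CASES`, `generate`, `analyze`, `lemmatize`) are illustrative.

```python
from itertools import product

# Hypothetical toy lexicon: lemma -> (vowel-harmony class, plural allomorph).
LEXICON = {
    "ház": ("back", "ak"),    # 'house'
    "kert": ("front", "ek"),  # 'garden'
}

# Hypothetical case inventory; each suffix allomorph is chosen by vowel harmony.
CASES = {
    "NOM": {"back": "", "front": ""},
    "INE": {"back": "ban", "front": "ben"},  # inessive, 'in ...'
    "DAT": {"back": "nak", "front": "nek"},  # dative, 'to/for ...'
}


def generate(lemma: str, number: str, case: str) -> str:
    """Morphological generator: lemma + tags -> surface form."""
    harmony, plural = LEXICON[lemma]
    number_suffix = plural if number == "PL" else ""
    return lemma + number_suffix + CASES[case][harmony]


# The analyzer table is the inverse of the generator over all tag
# combinations, so analysis and generation cannot disagree.
ANALYSES = {}
for lemma, number, case in product(LEXICON, ("SG", "PL"), CASES):
    form = generate(lemma, number, case)
    ANALYSES.setdefault(form, []).append((lemma, number, case))


def analyze(word: str) -> list[tuple[str, str, str]]:
    """Morphological analyzer: surface form -> all (lemma, number, case) readings."""
    return ANALYSES.get(word, [])


def lemmatize(word: str) -> list[str]:
    """Lemmatizer: surface form -> its distinct lemmas, with the tags dropped."""
    return sorted({lemma for lemma, _, _ in analyze(word)})


print(analyze("házakban"))             # [('ház', 'PL', 'INE')]
print(lemmatize("kertnek"))            # ['kert']
print(generate("kert", "PL", "DAT"))   # kerteknek
```

Deriving the analyzer from the generator keeps the two tools consistent by construction. A toy table like this ignores the stem alternations and large lexicons that a real analyzer for Komi or the other languages listed above has to handle.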

Publication date: 2005